{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "using VMLS\n",
    "using LinearAlgebra"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chapter 14\n",
    "# Least squares classification\n",
    "### 14.1 Classification\n",
    "**Boolean values.** Julia has the Boolean values `true` and `false`. These are automatically converted to the numbers `1` and `0` when they combined in numerical expressions. In VMLS we use the encoding (for classifiers) where `True` corresponds to `+1` and `False` corresponds to `−1`. We can get our encoding from a Julia Boolean value `b` using `2*b-1`, or via the ternary conditional operation `b ? 1 : -1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf2pm1(b) = 2*b-1\n",
    "b = true"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf2pm1(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b = false"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-1"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf2pm1(b) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3-element Array{Bool,1}:\n",
       "  true\n",
       " false\n",
       "  true"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b = [ true, false, true ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3-element Array{Int64,1}:\n",
       "  1\n",
       " -1\n",
       "  1"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf2pm1.(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Confusion matrix.** Let’s see how we would evaluate the prediction errors and\n",
    "confusion matrix, given a set of data `y` and predictions `yhat`, both stored as arrays\n",
    "(vectors) of Boolean values, of length `N`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Int64,2}:\n",
       " 24  27\n",
       " 23  26"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Count errors and correct predictions\n",
    "Ntp(y,yhat) = sum( (y .== true) .& (yhat .== true) );\n",
    "Nfn(y,yhat) = sum( (y .== true) .& (yhat .== false) );\n",
    "Nfp(y,yhat) = sum( (y .== false) .& (yhat .== true) );\n",
    "Ntn(y,yhat) = sum( (y .== false) .& (yhat .== false) );\n",
    "error_rate(y,yhat) = (Nfn(y,yhat) + Nfp(y,yhat)) / length(y);\n",
    "confusion_matrix(y,yhat) = [ Ntp(y,yhat) Nfn(y,yhat); Nfp(y,yhat) Ntn(y,yhat) ];\n",
    "y = rand(Bool,100); yhat = rand(Bool,100);\n",
    "confusion_matrix(y,yhat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "error_rate(y,yhat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dots that precede `==` and `&` cause them to be evaluated elementwise. When\n",
    "we sum the Boolean vectors, they are converted to integers. In the last section\n",
    "of the code we generate two random Boolean vectors, so we expect the error\n",
    "rate to be around $50%$. In the code above, we compute the error rate from\n",
    "the numbers of false negatives and false positives. A more compact expression\n",
    "for the error rate is `avg(y .!= yhat)`. The VMLS package contains the function\n",
    "`confusion_matrix(y, yhat)`.\n",
    "\n",
    "### 14.2 Least squares classifier\n",
    "We can evaluate $f̂(x) = sign(f̃(x))$ using `ftilde(x)>0`, which returns a Boolean\n",
    "value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "fhat (generic function with 1 method)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ftilde(x) = x'*beta .+ v # Regression model\n",
    "fhat(x) = ftilde(x) > 0 # Regression classifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Iris flower classification.** The `Iris` data set contains of $150$ examples of three types of iris flowers. There are $50$ examples of each class. For each example, four features are provided. The following code reads in a dictionary containing three $50 × 4$ matrices `setosa`, `versicolor`, `virginica` with the examples for each class, and then computes a Boolean classifier that distinguishes $Iris Virginica$ from the the other two classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5-element Array{Float64,1}:\n",
       " -2.3905637266512043  \n",
       " -0.0917521691013458  \n",
       "  0.4055367711191057  \n",
       "  0.007975822012793829\n",
       "  1.1035586498675736  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "D = iris_data();\n",
    "# Create 150x4 data matrix\n",
    "iris = vcat(D[\"setosa\"], D[\"versicolor\"], D[\"virginica\"])\n",
    "# y[k] is true (1) if virginica, false (0) otherwise\n",
    "y = [ zeros(Bool, 50); zeros(Bool, 50); ones(Bool, 50) ];\n",
    "A = [ ones(150) iris ]\n",
    "theta = A \\ (2*y .- 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Int64,2}:\n",
       " 46   4\n",
       "  7  93"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yhat = A*theta .> 0;\n",
    "C = confusion_matrix(y,yhat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.07333333333333333"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "err_rate = (C[1,2] + C[2,1]) / length(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.07333333333333333"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg(y .!= yhat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 14.3 Multi-class classifiers\n",
    "**Multi-class error rate and confusion matrix.** The overall error rate is easily evaluated as `avg(y .!= yhat)`. We can form the $K×K$ confusion matrix from a set of $N$ true outcomes $y$ and $N$ predictions `yhat` (each with entries among ${1, . . . ,K}$) by counting the number of times each pair of values occurs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "function confusion_matrix(y, yhat; K=2)\n",
    "C = zeros(K,K)\n",
    "for i in 1:K for j in 1:K\n",
    "    C[i,j] = sum((y .== i) .& (yhat .== j))\n",
    "end end\n",
    "return C\n",
    "end;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4×4 Array{Float64,2}:\n",
       " 6.0  4.0  5.0   8.0\n",
       " 7.0  5.0  7.0  11.0\n",
       " 3.0  9.0  8.0   7.0\n",
       " 3.0  7.0  5.0   5.0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "error_rate(y, yhat) = avg(y .!= yhat);\n",
    "# test for K=4 on random vectors of length 100\n",
    "K = 4;\n",
    "y = rand(1:K, 100); yhat = rand(1:K, 100);\n",
    "C = confusion_matrix(y, yhat, K=K)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.76, 0.76)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "error_rate(y, yhat), 1-sum(diag(C))/sum(C)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function `confusion_matrix` is included in the `VMLS` package.\n",
    "\n",
    "**Least squares multi-class classifier.** A $K$-class classifier (with regression model) can be expressed as\n",
    "$$\n",
    "f̂(x) = argmax_{k=1,...,K}f̃_k(x),\n",
    "$$\n",
    "where $f̃k(x) = x^Tθ_k$. The $n$-vectors $θ1, . . . , θK$ are the coefficients or parameters in the model. We can express this in matrix-vector notation as\n",
    "$$\n",
    "f̂(x) = argmax(x^TΘ),\n",
    "$$\n",
    "where $Θ = [ θ1 · · · θK ]$ is the $n × K$ matrix of model coefficients, and the argmax of a row vector has the obvious meaning.\n",
    "\n",
    "Let’s see how to express this in Julia. In Julia the function `argmax(u)` finds the index of the largest entry in the row or column vector $u$, i.e., $argmax_k u_k$. To extend this to matrices, we define a function `row_argmax` that returns a vector with, for each row, the index of the largest entry in that row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4×5 Array{Float64,2}:\n",
       "  1.31775   -1.00454    0.288372  -0.543231   0.236552\n",
       "  1.49855    1.16705   -1.19893    1.74655    2.06714 \n",
       "  0.174156   0.568129   0.63655    1.15186   -0.332617\n",
       " -0.651439  -0.317868  -1.20077   -1.47504    0.184936"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "row_argmax(u) = [ argmax(u[i,:]) for i = 1:size(u,1) ]\n",
    "A = randn(4,5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4-element Array{Int64,1}:\n",
       " 1\n",
       " 5\n",
       " 4\n",
       " 5"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "row_argmax(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If a data set with $N$ examples is stored as an $n × N$ data matrix `X`, and `Theta` is\n",
    "an $n × K$ matrix with the coefficient vectors $θ_k$ as its columns, then we can now\n",
    "define a function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "fhat (generic function with 2 methods)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fhat(X,Theta) = row_argmax(X'*Theta)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "to find the $N$-vector of predictions.\n",
    "\n",
    "**Matrix least squares.** Let’s use least squares to find the coefficient matrix Θ for a\n",
    "multi-class classifier with n features and $K$ classes, from a data set of $N$ examples.\n",
    "We will assume the data is given as an $n × N$ matrix $X$ and an $N$ - vector $y^{cl}$\n",
    "with entries in ${1, . . . ,K}$ that give the classes of the examples. The least squares\n",
    "objective can be expressed as a matrix norm squared,\n",
    "$$\n",
    "‖X^TΘ− Y ‖^2,\n",
    "$$\n",
    "where $Y$ is the $N × K$ vector with\n",
    "\n",
    "$$\n",
    "Y_{ij} = \n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    " 1 & y^{cl}_i = j \\\\\n",
    "−1 & y^{cl}_i \\neq j.\n",
    "\\end{array}\n",
    "\\right.\n",
    "$$\n",
    "In other words, the rows of Y describe the classes using one-hot encoding, converted\n",
    "from $0/1$ to $−1/+1$ values. The least squares solution is given by $Θ̂=(X^T)^†Y$.\n",
    "\n",
    "Let’s see how to express this in Julia."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6-element Array{Int64,1}:\n",
       " 4\n",
       " 3\n",
       " 2\n",
       " 1\n",
       " 2\n",
       " 3"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function one_hot(ycl,K)\n",
    "N = length(ycl)\n",
    "Y = zeros(N,K)\n",
    "for j in 1:K\n",
    "    Y[findall(ycl .== j), j] .= 1\n",
    "end\n",
    "return Y\n",
    "end;\n",
    "K = 4;\n",
    "ycl = rand(1:K,6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6×4 Array{Float64,2}:\n",
       " 0.0  0.0  0.0  1.0\n",
       " 0.0  0.0  1.0  0.0\n",
       " 0.0  1.0  0.0  0.0\n",
       " 1.0  0.0  0.0  0.0\n",
       " 0.0  1.0  0.0  0.0\n",
       " 0.0  0.0  1.0  0.0"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y = one_hot(ycl, K)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6×4 Array{Float64,2}:\n",
       " -1.0  -1.0  -1.0   1.0\n",
       " -1.0  -1.0   1.0  -1.0\n",
       " -1.0   1.0  -1.0  -1.0\n",
       "  1.0  -1.0  -1.0  -1.0\n",
       " -1.0   1.0  -1.0  -1.0\n",
       " -1.0  -1.0   1.0  -1.0"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "2*Y .- 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the functions we have defined, the matrix least squares multi-class classifier\n",
    "can be computed in a few lines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ls_multiclass (generic function with 1 method)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function ls_multiclass(X,ycl,K)\n",
    "n, N = size(X)\n",
    "Theta = X' \\ (2*one_hot(ycl,K) .- 1)\n",
    "yhat = row_argmax(X'*Theta)\n",
    "return Theta, yhat\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Iris flower classification.** We compute a $3$-class classifier for the iris flower data set. We split the data set of $150$ examples in a training set of $120$ ($40$ per class) and a test set of $30$ ($10$ per class). The code calls the functions we defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "D = iris_data();\n",
    "setosa = D[\"setosa\"];\n",
    "versicolor = D[\"versicolor\"];\n",
    "virginica = D[\"virginica\"];\n",
    "# pick three random permutations of 1,..., 50\n",
    "import Random\n",
    "I1 = Random.randperm(50);\n",
    "I2 = Random.randperm(50);\n",
    "I3 = Random.randperm(50);\n",
    "# training set is 40 randomly picked examples per class\n",
    "Xtrain = [ setosa[I1[1:40],:]; \n",
    "    versicolor[I2[1:40],:]; \n",
    "    virginica[I3[1:40],:] ]'; # 4x120 data matrix\n",
    "# add constant feature one\n",
    "Xtrain = [ ones(1,120); Xtrain ]; # 5x120 data matrix\n",
    "ytrain = [ ones(40); 2*ones(40); 3*ones(40) ];\n",
    "# test set is remaining 10 examples for each class\n",
    "Xtest = [ setosa[I1[41:end],:]; \n",
    "    versicolor[I2[41:end],:] \n",
    "    virginica[I3[41:end],:] ]'; # 4x30 data matrix\n",
    "Xtest = [ ones(1,30); Xtest ]; # 5x30 data matrix\n",
    "ytest = [ones(10); 2*ones(10); 3*ones(10)];"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3×3 Array{Float64,2}:\n",
       " 40.0   0.0   0.0\n",
       "  0.0  28.0  12.0\n",
       "  0.0   5.0  35.0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Theta, yhat = ls_multiclass(Xtrain, ytrain, 3);\n",
    "Ctrain = confusion_matrix(ytrain, yhat, K=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.14166666666666666"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "error_train = error_rate(ytrain, yhat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30-element Array{Int64,1}:\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 1\n",
       " 2\n",
       " 2\n",
       " 2\n",
       " ⋮\n",
       " 2\n",
       " 3\n",
       " 2\n",
       " 3\n",
       " 3\n",
       " 3\n",
       " 3\n",
       " 3\n",
       " 2\n",
       " 3\n",
       " 3\n",
       " 3"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yhat = row_argmax(Xtest'*Theta)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3×3 Array{Float64,2}:\n",
       " 10.0  0.0  0.0\n",
       "  0.0  7.0  3.0\n",
       "  0.0  2.0  8.0"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Ctest = confusion_matrix(ytest, yhat, K=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.16666666666666666"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "error_test = error_rate(ytest, yhat)"
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "Julia 1.1.1",
   "language": "julia",
   "name": "julia-1.1"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.1.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
